Simultaneous set-wise testing under dependence, with applications to genome-wide association studies

Authors

  • Wei Wang
  • Zhi Wei
  • Wenguang Sun
Abstract

We consider the problem of identifying disease-associated genomic regions in genome-wide association studies (GWAS). We show that conventional single-SNP analysis can be greatly improved by (i) exploiting the spatial dependency and (ii) conducting set-wise analysis. The SNP set association problem can be conceptualized as the problem of simultaneously testing a large number of sets of hypotheses. We use hidden Markov models to exploit the linkage disequilibrium information in GWAS data, based on which a data-driven screening procedure (GLIS) is proposed. GLIS is shown to be optimal in the sense that it has the smallest missed set rate (MSR) among all valid false set rate (FSR) procedures. The numerical results demonstrate that the proposed procedure controls the FSR at the desired level, enjoys certain optimality properties and outperforms conventional combined p-value methods. We apply the GLIS procedure to a Type 1 diabetes (T1D) GWAS dataset to detect T1D-associated genomic regions. The results show that our proposed SNP set analysis not only provides better biological insights, but also increases the statistical power by pooling information from different samples.
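
The abstract does not spell out the screening rule, so the following is only an illustrative sketch of a generic step-up rule of the kind commonly used with posterior-probability-based test statistics; it is not the authors' GLIS implementation. It assumes that each SNP set already has an estimated posterior probability of being null (for example, from a fitted hidden Markov model). The function name `step_up_reject`, the input `set_null_probs`, and the level `alpha` are hypothetical names introduced here for illustration.

```python
import numpy as np


def step_up_reject(null_probs, alpha=0.05):
    """Illustrative step-up rule (an assumption, not the paper's code).

    Sort the sets by their posterior null probability and reject the
    largest group of top-ranked sets whose average null probability
    does not exceed alpha.
    """
    p = np.asarray(null_probs, dtype=float)
    order = np.argsort(p, kind="stable")  # most promising sets first
    # Running average of the sorted null probabilities.
    running_mean = np.cumsum(p[order]) / np.arange(1, p.size + 1)
    passing = np.nonzero(running_mean <= alpha)[0]
    rejected = np.zeros(p.size, dtype=bool)
    if passing.size:
        # Reject every set up to the last rank that still passes.
        rejected[order[: passing[-1] + 1]] = True
    return rejected


# Hypothetical posterior null probabilities for five SNP sets.
set_null_probs = [0.01, 0.80, 0.05, 0.30, 0.02]
decisions = step_up_reject(set_null_probs, alpha=0.05)
print(decisions.tolist())
```

In this sketch the average null probability among the rejected sets serves as an estimate of the false set rate, which is why the rule stops at the last rank where that average is still at or below `alpha`.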

Similar resources

Genome-wide Association Study to Identify Genes and Biological Pathways Associated with Type Traits in Cattle using Pathway Analysis

Extended Abstract Introduction and Objective: Type traits describing the skeletal characteristics of an animal are moderately to strongly genetically correlated with other economically important traits in cattle, including fertility, longevity and carcass traits. The present study aimed to conduct a genome-wide association study (GWAS) based on gene-set enrichment analysis for identifying the ...

Graphical-model Based Multiple Testing under Dependence, with Applications to Genome-wide Association Studies

Large-scale multiple testing tasks often exhibit dependence, and leveraging the dependence between individual tests remains a challenging and important problem in statistics. With recent advances in graphical models, it is feasible to use them to perform multiple testing under dependence. We propose a multiple testing procedure which is based on a Markov-random-field-coupled mixture model. T...

Optimal High Dimensional Multiple Testing Under Linear Models

High-dimensional multiple testing has many important applications. Motivated by genome-wide association studies (GWAS), we consider the problem of multiple testing under a high-dimensional sparse linear model in order to identify the genetic markers associated with the trait of interest. The model is an extension of the normal mixture model under arbitrary dependence. We propose a multiple testi...

Estimating False Discovery Proportion Under Arbitrary Covariance Dependence.

Multiple hypothesis testing is a fundamental problem in high-dimensional inference, with wide applications in many scientific fields. In genome-wide association studies, tens of thousands of tests are performed simultaneously to find whether any SNPs are associated with some traits, and those tests are correlated. When test statistics are correlated, false discovery control becomes very challenging u...

Statistical Methods for Genome-wide Association Studies and Personalized Medicine

In genome-wide association studies (GWAS), researchers analyze the genetic variation across the entire human genome, searching for variations that are associated with observable traits or certain diseases. There are several inference challenges in GWAS, including the huge number of genetic markers to test, the weak association between truly associated markers and the traits, and the correlation...

Publication date: 2010